Abstract

We derive concentration inequalities for the spectral measure of large
random matrices, allowing for certain forms of dependence. Our main focus
is on empirical covariance (Wishart) matrices, but general symmetric
random matrices are also considered.

1 Introduction 

In this short paper, we study concentration of the spectral measure of large
random matrices whose elements need not be independent. In particular, we
derive a concentration inequality for Wishart matrices of the form $X'X/m$ in
the important setting where the rows of the $m \times n$ matrix $X$ are independent
but the elements within each row may depend on each other; see Theorem 1. We
also obtain similar results for other random matrices with dependent entries; see
Theorem 4, Theorem 5, and the attending examples, which include a random
graph with dependent edges, and vector time series.

Large random matrices have been the focus of intense research in recent
years; see Bai [2] and Guionnet [3] for surveys. Most of this literature
deals with the case where the underlying matrix has independent entries;
comparatively little is known for dependent cases. Götze and Tikhomirov
show that the expected spectral distribution of an empirical covariance
matrix $X'X/m$ converges to the Marčenko-Pastur law under conditions that allow
for some form of dependence among the entries of $X$. Bai and Zhou [1]
analyzed the limiting spectral distribution of $X'X/m$ when the row-vectors of
$X$ are independent (allowing for certain forms of dependence within the
row-vectors of $X$). Mendelson and Pajor [13] considered $X'X/m$ in the case where
the row-vectors of $X$ are independent and identically distributed (i.i.d.); under
some additional assumptions, they derive a concentration result for the operator
norm of $X'X/m - \mathbb{E}(X'X/m)$. Boutet de Monvel and Khorunzhy [4] studied
the limiting behavior of the spectral distribution and of the operator norm of
symmetric Gaussian matrices with dependent entries.

For large random matrices similar to those considered here, concentration of
the spectral measure is also studied by Guionnet and Zeitouni [8], who consider
Wishart matrices $X'X/m$ where the entries $X_{i,j}$ of $X$ are independent, as well
as Hermitian matrices with independent entries on and above the diagonal, and
by Houdré and Xu [9], who obtained concentration results for random matrices
with stable entries, thus allowing for certain forms of dependence. For matrices
with dependent entries, we find that concentration of the spectral measure can
be less pronounced than in the independent case. Technically, our results rely
on a slight extension of a result of Talagrand [14], and on McDiarmid's bounded
difference inequality [12].



2 Results 

Throughout, the eigenvalues of a symmetric $n \times n$ matrix $M$ are denoted
by $\lambda_1(M) \le \dots \le \lambda_n(M)$, and we write $F_M(\lambda)$ for the cumulative
distribution function (c.d.f.) of the spectral distribution of $M$, i.e.,
$F_M(\lambda) = n^{-1} \sum_{i=1}^{n} \mathbf{1}\{\lambda_i(M) \le \lambda\}$, $\lambda \in \mathbb{R}$. The integral of a
function $f(\cdot)$ with respect to the measure induced by $F_M$ is denoted by $F_M(f)$, i.e.,

$$F_M(f) = \frac{1}{n} \sum_{i=1}^{n} f(\lambda_i(M)).$$
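As a minimal numerical sketch of these two definitions, the following Python snippet evaluates the spectral c.d.f. and the integral of a function against the spectral measure. The helper names `spectral_cdf` and `spectral_integral` and the toy diagonal matrix are illustrative assumptions, not part of the paper.

```python
import numpy as np

def spectral_cdf(M, lam):
    """F_M(lam): fraction of eigenvalues of the symmetric matrix M that are <= lam."""
    eigs = np.linalg.eigvalsh(M)  # eigenvalues in ascending order
    return float(np.mean(eigs <= lam))

def spectral_integral(M, f):
    """F_M(f) = (1/n) * sum_i f(lambda_i(M)) for a vectorized function f."""
    eigs = np.linalg.eigvalsh(M)
    return float(np.mean(f(eigs)))

# Toy matrix (an assumption for illustration): eigenvalues are 1, 2, 3, 4.
M = np.diag([1.0, 2.0, 3.0, 4.0])
half = spectral_cdf(M, 2.5)                 # two of the four eigenvalues are <= 2.5
mean_sq = spectral_integral(M, np.square)   # (1 + 4 + 9 + 16) / 4
```

Taking `f` to be an indicator function in `spectral_integral` recovers `spectral_cdf`, which is how the c.d.f. statements later in the text arise as special cases of statements about $F_M(f)$.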

For certain classes of random matrices $M$ and certain classes of functions $f$,
we will show that $F_M(f)$ is concentrated around its expectation $\mathbb{E} F_M(f)$ or
around any median $\operatorname{med} F_M(f)$. For a Lipschitz function $g$, we write $\|g\|_L$ for
its Lipschitz constant. Moreover, we also consider functions $f : (a,b) \to \mathbb{R}$ that
are of bounded variation on $(a,b)$ (where $-\infty \le a < b \le \infty$), in the sense that

$$V_f(a,b) = \sup_{n \ge 1} \; \sup_{a < x_0 \le x_1 \le \dots \le x_n < b} \; \sum_{k=1}^{n} |f(x_k) - f(x_{k-1})|$$

is finite; cf. Section X.1 in [10]. [A function $f$ is of bounded variation on $(a,b)$ if
and only if it can be written as the difference of two bounded monotone functions
on $(a,b)$, as is easy to see. Note that the indicator function $g : x \mapsto \mathbf{1}\{x \le \lambda\}$ is
of bounded variation on $\mathbb{R}$ with $V_g(\mathbb{R}) = 1$ for each $\lambda \in \mathbb{R}$.]

The following result establishes concentration of $F_S(f)$ for Wishart matrices
$S$ of the form $S = X'X/m$ where we only require that the rows of $X$ are
independent (while allowing for dependence within each row of $X$). See also
Example 8 and Example 9, which follow, for scenarios that also allow for some
dependence among the rows of $X$.

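Theorem 1. Let $X$ be an $m \times n$ matrix whose row-vectors are independent, set
$S = X'X/m$, and fix $f : \mathbb{R} \to \mathbb{R}$.

(i) Suppose that $f$ is such that the mapping $x \mapsto f(x^2)$ is convex and Lipschitz,
and suppose that $|X_{i,j}| \le 1$ for each $i$ and $j$. For each $\epsilon > 0$, we then have

$$\mathbb{P}\big(|F_S(f) - \operatorname{med} F_S(f)| \ge \epsilon\big) \le 4 \exp\left[ -\frac{nm}{n+m} \, \frac{\epsilon^2}{8 \|f(\cdot^2)\|_L^2} \right]. \tag{1}$$

[From the upper bound (1) one can also obtain a similar bound for
$\mathbb{P}(|F_S(f) - \mathbb{E} F_S(f)| \ge \epsilon)$ using standard methods.]

(ii) Suppose that $f$ is of bounded variation on $\mathbb{R}$. For each $\epsilon > 0$, we then have

$$\mathbb{P}\big(|F_S(f) - \mathbb{E} F_S(f)| \ge \epsilon\big) \le 2 \exp\left[ -\frac{n^2}{m} \, \frac{2\epsilon^2}{V_f^2(\mathbb{R})} \right]. \tag{2}$$

In particular, for each $\lambda \in \mathbb{R}$ and each $\epsilon > 0$, the probability
$\mathbb{P}(|F_S(\lambda) - \mathbb{E} F_S(\lambda)| \ge \epsilon)$ is bounded by the right-hand side of (2) with
$V_f(\mathbb{R})$ replaced by 1.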



The upper bounds in Theorem 1 are of the form

$$\mathbb{P}\big(|F_S(f) - A| \ge \epsilon\big) \le B \exp[-nC], \tag{3}$$

where $A$, $B$, and $C$ equal $\operatorname{med} F_S(f)$, 4, and $m\epsilon^2 / \big((n+m)\, 8 \|f(\cdot^2)\|_L^2\big)$ in part (i)
and $\mathbb{E} F_S(f)$, 2, and $n\, 2\epsilon^2 / \big(m V_f^2(\mathbb{R})\big)$ in part (ii), respectively. For the interesting
case where $n$ and $m$ both go to infinity at the same rate, the next example
shows that these bounds cannot be improved qualitatively without imposing
additional assumptions.
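To get a feel for the size of these bounds, the following Python sketch evaluates the right-hand sides of (1) and (2) as read off from the constants $B$ and $C$ of (3). The function names and the numerical inputs are hypothetical and chosen only for illustration.

```python
import math

def bound_part_i(n, m, eps, lip):
    """Right-hand side of (1): 4 * exp(-(n*m/(n+m)) * eps^2 / (8 * lip^2)).

    lip stands for the Lipschitz constant of x -> f(x^2).
    """
    return 4.0 * math.exp(-(n * m / (n + m)) * eps ** 2 / (8.0 * lip ** 2))

def bound_part_ii(n, m, eps, var):
    """Right-hand side of (2): 2 * exp(-(n^2/m) * 2 * eps^2 / var^2).

    var stands for the total variation V_f(R) of f.
    """
    return 2.0 * math.exp(-(n ** 2 / m) * 2.0 * eps ** 2 / var ** 2)

# Hypothetical inputs: n = m, Lipschitz constant 1, total variation 1.
for size in (200, 800, 3200):
    print(size, bound_part_i(size, size, 0.1, 1.0), bound_part_ii(size, size, 0.1, 1.0))
```

With $n = m$ both exponents are linear in $n$, which is the rate $-n$ discussed in the text; the bound in (1) only becomes non-trivial (smaller than 1) for fairly large $n$ because of the factor 4 and the constant 8 in the denominator.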



Example 2. Let $n = m = 2^k$, and let $X$ be the $n \times n$ matrix whose $i$-th row
is $R_i v_i'$, where $R_1, \dots, R_n$ are i.i.d. with $\mathbb{P}(R_1 = 0) = \mathbb{P}(R_1 = 1) = 1/2$, and
where $v_1, \dots, v_n$ are orthogonal $n$-vectors with $v_i \in \{-1, 1\}^n$ for each $i$. [The
$v_i$'s can be obtained, say, from the first $n$ binary Walsh functions; cf. [15].]
Note that the eigenvalues of $S = X'X/m$ are $R_1^2, \dots, R_n^2$. Set $f(x) = x$ for
$x \in \{0, 1\}$. Then $nF_S(f)$ is binomial distributed with parameters $n$ and $1/2$,
i.e., $nF_S(f) \sim B(n, 1/2)$. By Chernoff's method (cf. Theorem 1 of [5]), we
hence obtain that

$$\mathbb{P}\big(F_S(f) - \mathbb{E} F_S(f) \ge \epsilon\big) = \exp\big[-n\,(C(\epsilon) + o(1))\big], \tag{4}$$

for $0 < \epsilon < 1/2$ and as $n \to \infty$ with $k \to \infty$, where here $C(\epsilon)$ equals
$\log(2) + (1/2 + \epsilon)\log(1/2 + \epsilon) + (1/2 - \epsilon)\log(1/2 - \epsilon)$; the same is true if
$\mathbb{E} F_S(f) - F_S(f)$ replaces $F_S(f) - \mathbb{E} F_S(f)$ in (4). These statements continue
to hold with $\operatorname{med} F_S(f)$ replacing $\mathbb{E} F_S(f)$, because the mean coincides with the
median here. To apply Theorem 1(i), we extend $f$ by setting $f(x) = \sqrt{|x|}$ for
$x \in \mathbb{R}$; to apply Theorem 1(ii), extend $f$ as $f(x) = \mathbf{1}\{x \ge 1/2\}$. Theorem 1(i)
and Theorem 1(ii) give us that the left hand side of (4) is bounded by terms of
the form $4 \exp[-nC_1(\epsilon)]$ and $2 \exp[-nC_2(\epsilon)]$, respectively, for some functions
$C_1$ and $C_2$ of $\epsilon$. It is easy to check that $C(\epsilon)/C_i(\epsilon)$ is increasing in $\epsilon$ for $i = 1, 2$,
and that

$$\lim_{\epsilon \downarrow 0} \frac{C(\epsilon)}{C_1(\epsilon)} = 32 \qquad \text{and} \qquad \lim_{\epsilon \downarrow 0} \frac{C(\epsilon)}{C_2(\epsilon)} = 1.$$

Hence, both parts of Theorem 1 give upper bounds with the correct rate $(-n)$ in
the exponent. The constants $C_i(\epsilon)$, $i = 1, 2$, both are sub-optimal, i.e., they are
too small, but the constant $C_2(\epsilon)$, which is obtained from Theorem 1(ii), is close
to the optimal constant for small $\epsilon$.
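The two limits in Example 2 can be checked numerically. In the sketch below, `rate_part_i` and `rate_part_ii` are the constants $C$ of (3) specialized to $n = m$, a Lipschitz constant of 1 for $x \mapsto f(x^2)$, and total variation 1; this specialization (giving $\epsilon^2/16$ and $2\epsilon^2$) and all function names are assumptions made for illustration.

```python
import math

def chernoff_rate(eps):
    """C(eps) from (4): log 2 + (1/2+eps) log(1/2+eps) + (1/2-eps) log(1/2-eps)."""
    return (math.log(2.0)
            + (0.5 + eps) * math.log(0.5 + eps)
            + (0.5 - eps) * math.log(0.5 - eps))

def rate_part_i(eps):
    """C_1(eps) = eps^2 / 16: part (i) of (3) with n = m and Lipschitz constant 1."""
    return eps ** 2 / 16.0

def rate_part_ii(eps):
    """C_2(eps) = 2 eps^2: part (ii) of (3) with n = m and total variation 1."""
    return 2.0 * eps ** 2

for eps in (0.01, 0.1, 0.2, 0.3, 0.4):
    c = chernoff_rate(eps)
    print(f"eps={eps:.2f}  C/C1={c / rate_part_i(eps):8.3f}  C/C2={c / rate_part_ii(eps):7.4f}")
```

Both ratios grow with $\epsilon$ and approach 32 and 1, respectively, as $\epsilon$ decreases, which is consistent with the expansion $C(\epsilon) = 2\epsilon^2 + O(\epsilon^4)$.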

Under additional assumptions on the law of $X$, $F_S(f)$ can concentrate faster
than indicated by (3). In particular, in the setting of Theorem 1(i) and for the
case where all the elements $X_{i,j}$ of $X$ are independent, Guionnet and Zeitouni [8]
obtained bounds of the same form as (3) but with $n^2$ replacing $n$ in the
exponent, for functions $f$ such that $x \mapsto f(x^2)$ is convex and Lipschitz. (This should
be compared with Example 9 below.) However, if $f$ does not satisfy this
requirement, but is of bounded variation on $\mathbb{R}$ so that Theorem 1(ii) applies, then
the upper bound in (2) cannot be improved qualitatively without additional
assumptions, even in the case when all the elements $X_{i,j}$ of $X$ are independent.
This is demonstrated by the following example.

Example 3. Let $X$ be the $n \times n$ diagonal matrix $\operatorname{diag}(R_1, \dots, R_n)$, where
$R_1, \dots, R_n$ are as in Example 2. Set $f(x) = \mathbf{1}\{x \le 0\}$. Clearly, Theorem 1(ii)
applies here so that the left hand side of (2) is bounded by $2 \exp[-nC_2(\epsilon)]$ for $C_2(\epsilon)$
as in Example 2. Moreover, since for each $i$, $f(R_i^2/n) = 1 - R_i$, it follows that
$nF_S(f) \sim B(n, 1/2)$, and then (4) holds again.

Theorem 1 can also be used to get concentration inequalities for the empirical
distribution of the singular values of a non-symmetric $n \times m$ matrix $X$ with
independent rows. Indeed, the $i$-th singular value of $X/\sqrt{m}$ is just the square
root of the $i$-th eigenvalue of $X'X/m$.

Both parts of Theorem 1 are in fact special cases of more general results
that are presented next. The following two theorems, the first of which should
be compared with Theorem 1.1(a) of [8], apply to a variety of random matrices
besides those considered in Theorem 1; some examples are given later in this
section.

Theorem 4. Let $M$ be a random symmetric $n \times n$ matrix that is a function
of $m$ independent $[-1,1]^p$-valued random vectors $Y_1, \dots, Y_m$, i.e.,
$M = M(Y_1, \dots, Y_m)$. Assume that $M(\cdot)$ is linear and Lipschitz with Lipschitz
constant $C_M$ when considered as a function from $[-1,1]^{mp}$ with the Euclidean norm
to the set of all symmetric $n \times n$ matrices with the Euclidean norm on $\mathbb{R}^{n(n+1)/2}$
(we view symmetric $n \times n$ matrices as elements of $\mathbb{R}^{n(n+1)/2}$ by collecting the
entries on and above the diagonal). Finally, assume that $f : \mathbb{R} \to \mathbb{R}$ is convex
and Lipschitz with Lipschitz constant $\|f\|_L$. For $S = M/\sqrt{m}$, we then have

$$\mathbb{P}\big(|F_S(f) - \operatorname{med} F_S(f)| \ge \epsilon\big) \le 4 \exp\left[ -\frac{nm}{p} \, \frac{\epsilon^2}{32\, C_M^2 \|f\|_L^2} \right] \tag{5}$$

for each $\epsilon > 0$.



Theorem 5. Let $M$ be a random symmetric $n \times n$ matrix that is a function of
$m$ independent random quantities $Y_1, \dots, Y_m$, i.e., $M = M(Y_1, \dots, Y_m)$. Write
$M_{(i)}$ for the matrix obtained from $M$ after replacing $Y_i$ by an independent copy,
i.e., $M_{(i)} = M(Y_1, \dots, Y_{i-1}, Y_i^*, Y_{i+1}, \dots, Y_m)$, where $Y_i^*$ is distributed as $Y_i$
and independent of $Y_1, \dots, Y_m$ $(i = 1, \dots, m)$. For $S = M/\sqrt{m}$ and
$S_{(i)} = M_{(i)}/\sqrt{m}$, assume that

$$\|F_S - F_{S_{(i)}}\|_\infty \le \frac{r}{n} \tag{6}$$

holds (almost surely) for each $i = 1, \dots, m$ and for some (fixed) integer $r$.
Finally, assume that $f : \mathbb{R} \to \mathbb{R}$ is of bounded variation on $\mathbb{R}$. For each $\epsilon > 0$, we
then have

$$\mathbb{P}\big(|F_S(f) - \mathbb{E} F_S(f)| \ge \epsilon\big) \le 2 \exp\left[ -\frac{n^2}{m} \, \frac{2\epsilon^2}{r^2 V_f^2(\mathbb{R})} \right]. \tag{7}$$

Also, if $a$ and $b$, $-\infty \le a < b \le \infty$, are such that $\mathbb{P}(a < \lambda_1(S) \text{ and } \lambda_n(S) < b) = 1$,
then (7) holds for each function $f : (a,b) \to \mathbb{R}$ of bounded variation on $(a,b)$,
where now $V_f(a,b)$ replaces $V_f(\mathbb{R})$ on the right hand side of (7).



To apply Theorem 5, one needs to establish the inequality in (6) for each $i =
1, \dots, m$. This can often be accomplished by using the following lemma, which is
taken from Bai [2], Lemma 2.2 and 2.6, and which is a simple consequence of the
interlacing theorem. [Consider a symmetric $n \times n$ matrix $A$ and denote its
$(n-1) \times (n-1)$ major submatrix by $B$. The interlacing theorem, a direct consequence
of the Courant-Fischer formula, states that $\lambda_i(A) \le \lambda_i(B) \le \lambda_{i+1}(A)$ for
$i = 1, \dots, n-1$.]

Lemma 6. Let $A$ and $B$ be symmetric $n \times n$ matrices and let $X$ and $Y$ be $m \times n$
matrices. Then the following inequalities hold:

$$\|F_A - F_B\|_\infty \le \frac{\operatorname{rank}(A - B)}{n}$$

and

$$\|F_{X'X} - F_{Y'Y}\|_\infty \le \frac{\operatorname{rank}(X - Y)}{n}.$$
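Both inequalities of Lemma 6 are easy to probe numerically. The following Python sketch compares spectral c.d.f.s before and after a rank-one change; the helper `cdf_sup_distance`, the Gaussian test matrices, and the dimensions are assumptions chosen for illustration.

```python
import numpy as np

def cdf_sup_distance(A, B):
    """sup over lambda of |F_A(lambda) - F_B(lambda)| for symmetric A, B of equal size."""
    ea, eb = np.linalg.eigvalsh(A), np.linalg.eigvalsh(B)
    grid = np.concatenate([ea, eb])   # step functions: the sup is attained at a jump point
    Fa = np.searchsorted(ea, grid, side="right") / ea.size
    Fb = np.searchsorted(eb, grid, side="right") / eb.size
    return float(np.max(np.abs(Fa - Fb)))

rng = np.random.default_rng(0)
n, m = 30, 40

# First inequality: a symmetric matrix A and a rank-one symmetric perturbation B.
G = rng.standard_normal((n, n))
A = (G + G.T) / 2.0
u = rng.standard_normal(n)
B = A + np.outer(u, u)
dist_sym = cdf_sup_distance(A, B)

# Second inequality: X and Y differ in a single row, so rank(X - Y) = 1.
X = rng.standard_normal((m, n))
Y = X.copy()
Y[0] = rng.standard_normal(n)
dist_gram = cdf_sup_distance(X.T @ X, Y.T @ Y)

print(dist_sym, dist_gram, 1.0 / n)
```

In both cases the computed distance should not exceed $1/n$, in line with the lemma; note that in the second case $X'X - Y'Y$ can have rank two, yet the bound is still governed by $\operatorname{rank}(X - Y) = 1$.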

We now give some examples where Theorem 4 or Theorem 5 can be applied,
the latter with the help of Lemma 6.

Example 7. Consider a network of, say, social connections or relations between
a group of $n$ entities that enter the group sequentially and that establish
connections to group members that entered before as follows: For the $i$-th entity that
enters the group, connections to the existing group members, labeled $1, \dots, i-1$, are
chosen according to some probability distribution, independently of the choices
made by all the other entities. Denote the $n \times n$ adjacency matrix of the resulting
random graph by $M$, and write $Y_i$ for the $n$-vector $(M_{i,1}, M_{i,2}, \dots, M_{i,i}, 0, \dots, 0)'$
for $i = 1, \dots, n$. By construction, $Y_1, \dots, Y_n$ are independent and $M$ (when
considered as a function of $Y_1, \dots, Y_n$ as in Theorem 4) is linear and Lipschitz with
Lipschitz constant 1. Hence Theorem 4 is applicable with $m = p = n$ and
$C_M = 1$.

Theorem 5 can also be applied here. To check condition (6), write $M_{(i)}$ for
the matrix obtained from $M$ by replacing $Y_i$ by an independent copy denoted
by $Y_i^*$ as in Theorem 5. Clearly, the $i$-th row of the matrix $M - M_{(i)}$ equals
$\delta_i = (Y_{i,1} - Y^*_{i,1}, \dots, Y_{i,i} - Y^*_{i,i}, 0, \dots, 0)$, the $i$-th column of $M - M_{(i)}$ equals $\delta_i'$,
and the remaining elements of $M - M_{(i)}$ all equal zero. Therefore, the rank of
$M - M_{(i)}$ is at most two. Using Lemma 6, we see that Theorem 5 is applicable
here with $r = 2$ and $m = n$.
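The rank argument in Example 7 can be illustrated with a small simulation. The sketch below assumes, purely for illustration, that each entity connects to each earlier entity independently with probability 1/2 and that there are no self-loops; the helper `adjacency_from_choices` and the chosen indices are hypothetical.

```python
import numpy as np

def adjacency_from_choices(Y):
    """Symmetric adjacency matrix built from row-wise choices.

    Only entries Y[i, j] with j < i are used: entity i connects to earlier entity j.
    """
    L = np.tril(Y, k=-1)
    return L + L.T

rng = np.random.default_rng(1)
n = 12
Y = rng.integers(0, 2, size=(n, n))      # row i holds the choices of the i-th entity
M = adjacency_from_choices(Y)

i = 7
Y_star = Y.copy()
Y_star[i] = rng.integers(0, 2, size=n)   # independent copy of the i-th entity's choices
M_i = adjacency_from_choices(Y_star)

D = M - M_i                              # nonzero only in row i and column i
rank_diff = np.linalg.matrix_rank(D)
print(rank_diff)
```

Because the difference is supported on a single row and the matching column, its rank cannot exceed two, which is what makes condition (6) hold with $r = 2$ via Lemma 6.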



The following two examples deal with the sample covariance matrix of vector
moving average (MA) processes. For the sake of simplicity, we only consider
MA processes of order 2. Our arguments can be extended to also handle MA
processes of any fixed and finite order. In Example 8, we consider an MA(2)
process with independent innovations, allowing for arbitrary dependence within
each innovation, and obtain concentration inequalities of the form (3). In
Example 9, we consider the case where each innovation has independent components
(up to a linear function) and obtain a concentration inequality of the form (3)
but with $n^2$ replacing $n$ in the exponent.

Example 8. Consider an $m \times n$ matrix $X$ whose row-vectors follow a vector MA
process of order 2, i.e., $(X_{i,\cdot})' = Y_{i+1} + BY_i$ for $i = 1, \dots, m$, where $Y_1, \dots, Y_{m+1}$ are
$m+1$ independent $n$-vectors and $B$ is some fixed $n \times n$ matrix. Set $S = X'X/m$.

(i) Suppose that $f$ is such that the mapping $x \mapsto f(x^2)$ is convex and Lipschitz,
and suppose that $Y_i \in [-1, 1]^n$ for each $i = 1, \dots, m+1$. For each $\epsilon > 0$, we
have

$$\mathbb{P}\big(|F_S(f) - \operatorname{med} F_S(f)| \ge \epsilon\big) \le 4 \exp\left[ -\frac{nm^2}{(m+1)(n+m)} \, \frac{\epsilon^2}{8\, C_B^2 \|f(\cdot^2)\|_L^2} \right]. \tag{8}$$

Here $C_B$ equals $1 + \|B\|$, where $\|B\|$ is the operator norm of the matrix $B$.
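(ii) Suppose that $f$ is of bounded variation on $\mathbb{R}$. For each $\epsilon > 0$, we then have

$$\mathbb{P}\big(|F_S(f) - \mathbb{E} F_S(f)| \ge \epsilon\big) \le 2 \exp\left[ -\frac{n^2}{m+1} \, \frac{\epsilon^2}{2 V_f^2(\mathbb{R})} \right]. \tag{9}$$

The proofs of (8) and (9) follow essentially the same argument as used in the
proof of Theorem 1, using the particular structure of the matrix $X$ as considered
here.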
Example 9. As in Example 8, consider an $m \times n$ matrix $X$ whose row-vectors
follow a vector MA(2) process $(X_{i,\cdot})' = Y_{i+1} + BY_i$ for some fixed $n \times n$ matrix
$B$, $i = 1, \dots, m$. For the innovations $Y_i$, we now assume that $Y_i = UZ_i$, where
$U$ is a fixed $n \times n$ matrix, and where the $Z_{i,j}$, $i = 1, \dots, m+1$, $j = 1, \dots, n$,
are independent and satisfy $|Z_{i,j}| \le 1$. Set $S = X'X/m$. For a function $f$ such
that the mapping $x \mapsto f(x^2)$ is convex and Lipschitz, we then obtain that

$$\mathbb{P}\big(|F_S(f) - \operatorname{med} F_S(f)| \ge \epsilon\big) \le 4 \exp\left[ -\frac{n^2 m}{n+m} \, \frac{\epsilon^2}{8\, C^2 \|f(\cdot^2)\|_L^2} \right] \tag{10}$$

for each $\epsilon > 0$, where $C$ is shorthand for $C = (1 + \|B\|)\,\|U\|$, with $\|B\|$ and
$\|U\|$ denoting the operator norms of the indicated matrices. The relation (10)
is derived by essentially repeating the proof of Theorem 1(i) and by employing
the particular structure of the matrix $X$ as considered here.

We note that the statement in the previous paragraph reduces to Corollary
1.8(a) in [8] if one sets $B$ to the zero matrix and $U$ to the identity matrix.
Moreover, we note that Theorem 5 can also be applied here (similarly to
Example 8(ii)), but the resulting upper bound does not improve upon (9).



A Proofs 

We first prove Theorem 4 and Theorem 5 and then use these results to deduce
Theorem 1. The proof of Theorem 4 is modeled after the proof of Theorem 1.1(a)
in Guionnet and Zeitouni [8]. It rests on a slight modification of Theorem 6.6
of Talagrand [14] that is given as Theorem 10 below, and also on Lemma 1.2
from Guionnet and Zeitouni [8] that is restated as Lemma 11, which follows.

Theorem 10. Fix $m \ge 1$ and $p \ge 1$. Consider a function $T : [-1, 1]^{mp} \to \mathbb{R}$
that is quasi-convex¹ and Lipschitz with Lipschitz constant $\sigma$. Let $Y_1, \dots, Y_m$ be
independent $p$-vectors, each taking values in $[-1,1]^p$, and consider the random
variable $T = T(Y_1, \dots, Y_m)$. For each $\epsilon > 0$, we then have

$$\mathbb{P}\big(|T - \operatorname{med} T| \ge \epsilon\big) \le 4 \exp\left( -\frac{1}{p\sigma^2} \, \frac{\epsilon^2}{16} \right). \tag{11}$$

The above theorem follows from Theorem 6.1 of Talagrand [14] by arguing
just like in the proof of Theorem 6.6 of Talagrand [14], but now using $[-1, 1]^p$
instead of $[-1, 1]$. When $p = 1$, Theorem 10 reduces to Theorem 6.6 of
Talagrand [14].

Lemma 11. Let $\mathcal{A}^n$ denote the set of all real symmetric $n \times n$ matrices and let
$u : \mathbb{R} \to \mathbb{R}$ be a fixed function. Let us denote by $\Lambda_u^n$ the functional $A \mapsto F_A(u)$
on $\mathcal{A}^n$. Then

(i) If $u$ is convex, then so is $\Lambda_u^n$.

(ii) If $u$ is Lipschitz, then so is $\Lambda_u^n$ (when considering $\mathcal{A}^n$ with the Euclidean
norm on $\mathbb{R}^{n(n+1)/2}$ by collecting the entries on and above the diagonal).
Moreover, the Lipschitz constant of $\Lambda_u^n$ satisfies

$$\|\Lambda_u^n\|_L \le \frac{\sqrt{2}}{\sqrt{n}} \, \|u\|_L.$$

¹ A real valued function $T$ is said to be quasi-convex if all the level sets $\{T \le a\}$, $a \in \mathbb{R}$,
are convex.

Remark 12. For a proof of this lemma, see Guionnet and Zeitouni [8, Proof of
Lemma 1.2]. A simpler proof (along with other similar results) of Lemma 11(i)
can be found in Lieb and Pedersen [11].



Proof of Theorem 4. Set $T = F_S(f)$ and let $\mathcal{A}^n$ be as in Lemma 11. In
view of Theorem 10, it suffices to show that $T = T(Y_1, \dots, Y_m)$ is such
that the function $T(\cdot)$ is quasi-convex and Lipschitz with Lipschitz constant
$\le (2/(nm))^{1/2} C_M \|f\|_L$. To this end, we write $T$ as the composition $T_2 \circ T_1$,
where $T_1 : ([-1,1]^p)^m \to \mathcal{A}^n$ and $T_2 : \mathcal{A}^n \to \mathbb{R}$ denote the mappings
$(y_1, \dots, y_m) \mapsto M(y_1, \dots, y_m)/\sqrt{m}$ and $A \mapsto F_A(f)$, respectively. By
assumption, $T_1$ is linear and Lipschitz with $\|T_1\|_L = C_M/\sqrt{m}$. Also, since $f$ is
assumed to be convex and Lipschitz, Lemma 11 entails that $T_2$ is convex and
Lipschitz with $\|T_2\|_L \le (2/n)^{1/2} \|f\|_L$. Since the composition of a convex function
with a linear map is convex, it follows that $T$ is convex (and hence
quasi-convex) and Lipschitz with $\|T\|_L \le (2/(nm))^{1/2} C_M \|f\|_L$. The proof is
complete. □



To prove Theorem 5, we recall McDiarmid's bounded difference inequality
([12]; see also Proposition 12 in [3]):

Proposition 13. Consider independent random quantities $Y_1, \dots, Y_m$, and a
(measurable) function $Z = f(Y_1, \dots, Y_m)$. For each $i = 1, \dots, m$, define
$Z_{(i)}$ like $Z$, but with $Y_i$ replaced by an independent copy; that is, $Z_{(i)} =
f(Y_1, \dots, Y_{i-1}, Y_i^*, Y_{i+1}, \dots, Y_m)$, where $Y_i^*$ is distributed as $Y_i$ and independent
of $Y_1, \dots, Y_m$. If

$$|Z - Z_{(i)}| \le c_i$$

holds (almost surely) for each $i = 1, \dots, m$, then, for each $\epsilon > 0$, both
$\mathbb{P}(Z - \mathbb{E}Z \ge \epsilon)$ and $\mathbb{P}(Z - \mathbb{E}Z \le -\epsilon)$ are bounded by $\exp\big[-2\epsilon^2 / \sum_{i=1}^{m} c_i^2\big]$.
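As a small worked check of how Proposition 13 produces the constant in (7), the following Python sketch plugs the bounded differences $c_i = rV_f/n$ (the choice used in the proof of Theorem 5) into the one-sided bound and doubles it. The function names and the numerical values are hypothetical.

```python
import math

def mcdiarmid_tail(eps, c):
    """One-sided bound exp(-2 eps^2 / sum_i c_i^2) of Proposition 13."""
    return math.exp(-2.0 * eps ** 2 / sum(ci ** 2 for ci in c))

def theorem5_bound(n, m, r, var, eps):
    """Two-sided bound (7): 2 exp(-(n^2/m) * 2 eps^2 / (r^2 var^2)), var = V_f(R)."""
    return 2.0 * math.exp(-(n ** 2 / m) * 2.0 * eps ** 2 / (r ** 2 * var ** 2))

n, m, r, var, eps = 50, 100, 2, 1.0, 0.1   # hypothetical values
c = [r * var / n] * m                      # c_i = r V_f / n for each of the m coordinates
two_sided = 2.0 * mcdiarmid_tail(eps, c)   # union of the upper and the lower tail
print(two_sided, theorem5_bound(n, m, r, var, eps))
```

The two printed numbers agree because $\sum_{i=1}^m c_i^2 = m r^2 V_f^2 / n^2$, so that $2\epsilon^2 / \sum_i c_i^2 = (n^2/m)\, 2\epsilon^2 / (r^2 V_f^2)$.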

Proof of Theorem 5. It suffices to prove the second claim. Hence assume that
$a$ and $b$, $-\infty \le a < b \le \infty$, are such that $\mathbb{P}(a < \lambda_1(S) \text{ and } \lambda_n(S) < b) = 1$ and
that $f : (a,b) \to \mathbb{R}$ is of bounded variation on $(a,b)$. We shall now show that

$$|F_S(f) - F_{S_{(i)}}(f)| \le r V_f(a,b)/n \qquad (i = 1, \dots, m). \tag{12}$$

With this, we can use the bounded difference inequality, i.e., Proposition 13,
with $Z$, $Z_{(i)}$, and $c_i$ $(1 \le i \le m)$ replaced by $F_S(f)$, $F_{S_{(i)}}(f)$, and $rV_f(a,b)/n$,
respectively, to obtain (7), completing the proof.

To obtain (12), set $G(\lambda) = F_S(\lambda) - F_{S_{(i)}}(\lambda)$ and choose $\alpha$ and $\beta$ satisfying
$a < \alpha < \min\{\lambda_1(S), \lambda_1(S_{(i)})\}$ and $b > \beta > \max\{\lambda_n(S), \lambda_n(S_{(i)})\}$. With these
choices, we can write $F_S(f) - F_{S_{(i)}}(f)$ as the Riemann-Stieltjes integral $\int_\alpha^\beta f \, dG$.
In particular, we have

$$\big|F_S(f) - F_{S_{(i)}}(f)\big| = \left| \int_\alpha^\beta f \, dG \right| = \left| \int_\alpha^\beta G \, df \right| \le \|G\|_\infty V_f(a,b),$$

where the second equality is obtained through integration by parts upon noting
that $G(\alpha) = G(\beta) = 0$. By assumption, $\|G\|_\infty = \|F_S - F_{S_{(i)}}\|_\infty \le r/n$, and (12)
follows. □



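Proof of Theorem 1. Our reasoning is similar to that used in the proof of
Corollary 1.8 of Guionnet and Zeitouni [8]. Set $\tilde{n} = m + n$ and write $\tilde{M}$ as shorthand
for the $\tilde{n} \times \tilde{n}$ matrix

$$\tilde{M} = \begin{pmatrix} 0_{n \times n} & X' \\ X & 0_{m \times m} \end{pmatrix}.$$

Moreover, set $\tilde{S} = \tilde{M}/\sqrt{m}$, and write $Y_i$ for the $i$-th row of $X$, $1 \le i \le m$, i.e.,
$Y_i = (X_{i,\cdot})'$. We view $\tilde{M}$ as a function of $Y_1, \dots, Y_m$. Also let $\tilde{f}(x) = f(x^2)$.

It is easy to check that

$$F_{\tilde{S}}(\tilde{f}) = \frac{2n}{\tilde{n}} F_S(f) + \frac{m - n}{\tilde{n}} f(0),$$

and hence

$$\mathbb{P}\big(|F_S(f) - \mu| \ge \epsilon\big) = \mathbb{P}\left( |F_{\tilde{S}}(\tilde{f}) - \tilde{\mu}| \ge \frac{2n}{\tilde{n}} \epsilon \right),$$

where $\mu$ ($\tilde{\mu}$) can be either $\mathbb{E} F_S(f)$ ($\mathbb{E} F_{\tilde{S}}(\tilde{f})$) or $\operatorname{med} F_S(f)$ ($\operatorname{med} F_{\tilde{S}}(\tilde{f})$).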
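The relation between the spectral measures of $S$ and of the symmetrized block matrix can be checked numerically. In the sketch below, the dimensions, the uniform entries of $X$, and the smooth test function $f(x) = e^{-x}$ are assumptions made for illustration; the identity being checked is $F_{\tilde S}(\tilde f) = (2n/\tilde n) F_S(f) + ((m-n)/\tilde n) f(0)$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 7, 4                                   # hypothetical small dimensions
X = rng.uniform(-1.0, 1.0, size=(m, n))
S = X.T @ X / m

n_tilde = m + n
# Symmetric block matrix with X' in the upper-right and X in the lower-left corner.
M_tilde = np.block([[np.zeros((n, n)), X.T],
                    [X, np.zeros((m, m))]])
S_tilde = M_tilde / np.sqrt(m)

f = lambda x: np.exp(-x)                      # test function, an assumption
f_tilde = lambda x: f(x ** 2)

lhs = np.mean(f_tilde(np.linalg.eigvalsh(S_tilde)))
rhs = (2 * n / n_tilde) * np.mean(f(np.linalg.eigvalsh(S))) + ((m - n) / n_tilde) * f(0.0)
print(lhs, rhs)
```

The agreement reflects the fact that the nonzero eigenvalues of the block matrix come in pairs $\pm\sigma$, where $\sigma$ ranges over the nonzero singular values of $X$, with the remaining eigenvalues equal to zero.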



To prove (i) it suffices to note that Theorem 4 applies with $\tilde{M}$, $\tilde{S}$, $\tilde{n}$, $n$, $\tilde{f}$,
and 1 replacing $M$, $S$, $n$, $p$, $f$, and $C_M$, respectively. Using Theorem 4 with
these replacements and with $2n\epsilon/\tilde{n}$ replacing $\epsilon$, we see that the left hand side of (1)
is bounded as claimed.



To prove (ii) we first note that $\|F_{\tilde{S}} - F_{\tilde{S}_{(i)}}\|_\infty \le 2/\tilde{n}$ in view of Lemma 6
(where $\tilde{S}_{(i)}$ is defined as $\tilde{S}$ but with $Y_i$ replaced by an independent copy).
Also, note that $\tilde{f}$ is of bounded variation on $\mathbb{R}$ with $V_{\tilde{f}}(\mathbb{R}) \le V_f(\mathbb{R})$. Hence,
Theorem 5 applies with $\tilde{M}$, $\tilde{S}$, $\tilde{n}$, 2 and $\tilde{f}$ replacing $M$, $S$, $n$, $r$ and $f$, respectively,
and (2) follows after elementary simplifications. □